Release 10.1A: OpenEdge Data Management:
SQL Development
Optimizer phases
The optimization process is divided into several phases. Some phases deal with internal infrastructure, such as minimizing data handling or temp-table usage. Others deal with significant cost factors and are straightforward to understand. Each phase addresses a specific type of optimization:
The optimizer follows a cost-based model. In each stage, whenever multiple alternatives are available, the optimizer estimates the cost for each and selects the cheapest. The cost computation takes into account:
The following sections provide details on the optimization phases.
Pushing restrict operations close to the data origin
This stage consists of moving restrict operators as far down the query tree as possible. This reduces the number of tuples moving up the tree for further processing and minimizes the amount of data handled. When restrict operations on a join node cannot be moved below the join node, they are set as join conditions. When multiple predicates are moved down the tree to the same relative position, they are reassembled into a single restrict operation, as shown in Example 10–1.
Example 10–1: Query statement optimization
SELECT Name FROM Employee
WHERE Salary > 4000 AND Salary <= 6000
AND Employee.DeptNum = Department.DeptNum;
The optimizer takes the input tree and transforms it as shown below. The restrictions
Salary > 4000 and Salary <= 6000are moved down the tree, below the join node, since they apply to a single table. The restrictionEmployee.DeptNum = Department.DeptNum, applying to two tables, stays above the join node.Using indexes for restrictions
This optimization phase consists of recognizing those cases where an existing index can be used to evaluate a restriction and converting a table scan into an index bracket or set of contiguous index entries. An index bracket is extremely effective in limiting the number of rows that must be processed.
Choosing the best index
To choose an index, the optimizer performs several stages. These stages determine whether an index can be used to process a restrict operation and, if there are multiple indexes to choose from, which index will be used:
Predicate expressions
When the predicates of an SQL statement use the
ORlogical operator to combine expressions that compare the same column with a constant, the optimizer converts these expressions to a singleINpredicate. The purpose of these transformations is, where possible, to combine multiple predicates into a single predicate for simpler evaluation in this and later stages.Similarly, a
LIKEpredicate on an index key, where theLIKEpattern has a prefix of fixed characters, is converted to aBETWEENpredicate.Generating candidate indexes
For every predicate in an SQL statement, the optimizer checks to see if there are indexes that include the columns referenced in the predicate.
Once the optimizer knows which indexes exist on the relevant tables, it generates a list of all the possible index predicates that could be used. For each predicate for which there is an index, the optimizer checks whether:
Selecting an index
When the list of candidate index predicates has been determined, the optimizer selects which, if any, it will use for an index scan operation.
This selection is cost-based. The optimizer computes the cost for each of the index candidates and the cost for a table scan using the default index. The candidate with the lowest cost is chosen.
Providing index hints
You can specify an index for each table in the
FROMclause of aSELECTquery.
index_valis a string that indicates the name of the index.If a candidate plan is generated with the specified index, the optimizer will use it. If the optimizer is unable to generate a candidate plan with the specified index, it ignores the hint.
Join optimization
There are two distinct optimization tasks done as part of the join optimization stage:
Determining join order among adjacent join nodes
After identifying a set of adjacent join nodes, the optimizer uses the available statistics to estimate the cardinality (the number of rows in the table or intermediate result) and selectivity (percentage of rows a predicate returns) for each subtree of the join nodes. It then uses the following criteria to determine the join order:
- The subtree with the lowest estimated cardinality is taken first. The SQL engine’s cost manager estimates the cardinality of each subtree by multiplying table cardinality by the selectivity of the predicates applied to the table.
- The subtree that has the lowest estimated join cardinality (number of output rows produced by a join with the first subtree) is taken second. When determining join cardinality, the optimizer considers whether there is a join condition between the two subtrees. It gives preference to subtree pairs that have join conditions.
- The subtree with the next lowest estimated join cardinality is taken next, and so on.
Choosing the join algorithm
Once the join order has been established, each join node is analyzed to select from among the following algorithms:
The optimizer generates, when possible, candidates for each algorithm. For each join node, candidates are generated by:
Once a set of candidates exists, the optimizer selects the least costly candidate.
Augmented nested loop join
The augmented nested loop (ANL) is by far the most common join method. An augmented nested loop join is performed by doing a scan over the left subtree and for each row in it, performing an index bracket scan on a portion of the right subtree. The right subtree is read as many times as there are rows in the left subtree.
To be a candidate for an ANL join, the subtree pair for a join node must meet the following criteria:
When an ANL join is possible on several indexes, the least-cost index is chosen.
When there is an index defined on the left subtree’s table instead of on the right, the optimizer analyzes the cost of swapping the subtrees to make an ANL join possible.
When neither subtree’s table has an index defined on the join column, the optimizer analyzes the cost of creating a dynamic index on one or both of the subtrees.
Merge join
A merge join is performed by opening simultaneous scans on both the left and right subtrees. Each row that satisfies the join condition is output by the join algorithm. Depending upon the result of the join column comparison, either the left or right scan pointer is advanced. The left and right subtrees are each read once. A merge join is almost never chosen because its cost invariably exceeds an ANL join.
Nested loop join
A nested loop join is performed by doing a scan over the left subtree and for each row in it performing a full scan of the right subtree.
This is the default join algorithm, which can be used for any join. However, it is usually less efficient than the other methods. Usually, either an existing index or a dynamic index, used in an ANL join, will cost much less. Occasionally, when subtree cardinalities are very low, possibly because of index bracketing, nested loop will be the method with the least cost.
Sort optimization
The optimizer performs two optimizations designed to avoid sort operations. The first optimization is to eliminate redundant sorts. The second optimization is to convert table scans into index bracket scans.
Eliminating redundant sorts
The optimizer checks whether the query tree contains unnecessary sort nodes. For example, when an SQL statement contains both a
GROUP BYclause and anORDER BYclause that refers to the same column, at most one sort is needed.A sort node is also redundant when the immediate descendant node of the sort node is an index bracket scan on the sort column. That is, the sort is redundant when the data input to the sort was read using an index with the needed sort order.
Converting table scans to index bracket scans
When a leaf node of a subtree is a table scan, the optimizer checks whether any indexes that exist on the table match the sort columns. If so, it analyzes the cost of each possible index bracket scan and compares the least of those with the sum of the cost of the table scan and sort operation.
If the analysis shows an index bracket scan as having less cost than the table scan and sort operation, the optimizer converts the table scan to the index bracket scan and removes the sort node.
Indexes to evaluate MAX/MIN functions
This stage of optimization examines subtrees that contain
MINandMAXaggregate functions. The optimizer checks if any index on the table matches the column specified in the function. If so, it replaces the table scan at the leaf node with an index bracket scan.The index bracket scan looks up the first or last value of the relevant index key. The first and last values represent the
MINandMAXvalues, respectively, for ascending indexes, and theMAXandMINvalues for descending indexes.Evaluating aggregate functions without fetching the table rows is not possible for indexed character columns because index entries for character data contain the “sort-weight” form of the column value, not the actual column value.
Index bracket scan optimization
This stage checks whether a table scan can be replaced by an index bracket scan. This is possible when a subtree meets the following criteria:
|
Copyright © 2005 Progress Software Corporation www.progress.com Voice: (781) 280-4000 Fax: (781) 280-4095 |